Crowd-supervised training of spoken language systems
Abstract
Spoken language systems are often deployed with static speech recognizers. Only rarely are parameters in the underlying language, lexical, or acoustic models updated on-the-fly. In the few instances where parameters are learned in an online fashion, developers traditionally resort to unsupervised training techniques, which are known to be inferior to their supervised counterparts. These realities make the development of spoken language interfaces a difficult and somewhat ad-hoc engineering task, since models for each new domain must be built from scratch or adapted from a previous domain. This thesis explores an alternative approach that makes use of human computation to provide crowd-supervised training for spoken language systems. We explore human-in-the-loop algorithms that leverage the collective intelligence of crowds of non-expert individuals to provide valuable training data at a very low cost for actively deployed spoken language systems. We also show that in some domains the crowd can be incentivized to provide training data for free, as a byproduct of interacting with the system itself. Through the automation of crowdsourcing tasks, we construct and demonstrate organic spoken language systems that grow and improve without the aid of an expert. Techniques that rely on collecting data remotely from non-expert users, however, are subject to the problem of noise. This noise can sometimes be heard in audio collected from poor microphones or muddled acoustic environments. Alternatively, noise can take the form of corrupt data from a worker trying to game the system – for example, a paid worker tasked with transcribing audio may leave transcripts blank in hopes of receiving a speedy payment. We develop strategies to mitigate the effects of noise in crowd-collected data and analyze their efficacy.
This research spans a number of different application domains of widely deployed spoken language interfaces, but maintains the common thread of improving the speech recognizer’s underlying models with crowd-supervised training algorithms. We experiment with three central components of a speech recognizer: the language model, the lexicon, and the acoustic model. For each component, we demonstrate the utility of a crowd-supervised training framework. For the language model and lexicon, we explicitly show that this framework can be used hands-free, in two organic spoken language systems.
Thesis Supervisor: Stephanie Seneff
Title: Senior Research Scientist
Similar resources
Automating Crowd-supervised Learning for Spoken Language Systems
Spoken language systems often rely on static speech recognizers. When the underlying models are improved on-the-fly, training is usually performed using unsupervised methods. In this work, we explore an alternative approach that uses human computation to provide crowd-supervised training of a deployed system. Although the framework we describe is applicable to any stochastic model for which the...
A Weakly Supervised Learning Approach for Spoken Language Understanding
In this paper, we present a weakly supervised learning approach for spoken language understanding in domain-specific dialogue systems. We model the task of spoken language understanding as a successive classification problem. The first classifier (topic classifier) is used to identify the topic of an input utterance. With the restriction of the recognized target topic, the second classifier (se...
Powering Spoken Language Interactions With the Crowd
Spoken language interfaces (SLIs) have the potential to facilitate more natural interactions between people and computers, but realizing this potential requires spoken language interfaces to not only recognize the words people say but also the meaning and intent behind them. These problems are very difficult to solve individually and each must be solved for the other to perform optimally. As a ...
Guardian: A Crowd-Powered Spoken Dialog System for Web APIs
Natural language dialog is an important and intuitive way for people to access information and services. However, current dialog systems are limited in scope, brittle to the richness of natural language, and expensive to produce. This paper introduces Guardian, a crowd-powered framework that wraps existing Web APIs into immediately usable spoken dialog systems. Guardian takes as input the Web AP...
For a fistful of dollars: using crowd-sourcing to evaluate a spoken language CALL application
We present an evaluation of a Web-deployed spoken language CALL system, carried out using crowd-sourcing methods. The system, “Survival Japanese”, is a crash course in tourist Japanese implemented within the platform CALL-SLT. The evaluation was carried out over one week using the Amazon Mechanical Turk. Although we found a high proportion of attempted scammers, there was a core of 23 subjects ...
Publication date: 2012